ABSTRACT


The applications of silicon compilers and the design
methodology of the Genesil Silicon Compiler are described.
The performance of Genesil system library adders and
multipliers is compared with that of comparable custom
pipelined adder and multiplier circuits built on the Genesil
Silicon Compiler. High performance pipeline methods are
discussed. The appendix is a tutorial illustrating a
hierarchical, top-down chip design on the Genesil system,
including simulation and timing analysis procedures.
I. INTRODUCTION


As integrated circuits (I.C.) grow increasingly complex,
new methods are needed to manage the complexity, cost, and
time consumed in designing and testing the desired I.C.
system. There is a demand for a quick and relatively
economical design process to precede the actual chip layout.

To meet these criteria, the methodology must support
design at higher levels to specify, design, and
simulate the desired circuit. A state-of-the-art solution to
this requirement is the silicon compiler.

Loosely defined, a silicon compiler is a system which
generates I.C. layouts from high-level descriptions.
Originally, silicon compilation referred to a design
methodology rather than a system or set of processes. The
silicon compiler was analogous to the compilation of machine
code from a high-level language, except that the object
produced was a graphic image rather than a block of executable
code. Geometries of the desired chip were constructed the same
way as machine code, i.e., compiled from high-level languages.
Early silicon compilers used "C" and "LISP" compilers [Ref. 1].

The latest silicon compilers are complete design systems.
Compilation is used as one mechanism for overall chip design.
State-of-the-art silicon compilers have computer aided
engineering (CAE) / computer aided design (CAD) as in the
past, but also include geometry editing, symbolic editing,
simulation, automatic placement, automatic routing,
compaction, and design rule checking [Ref. 1].

Each stage of a complex custom I.C. system design, from
concept to silicon testing, requires a team of experts in that
field. Each team normally is not expert in the other areas
of chip development. Team expertise is a necessary condition
in the fields of requirements generation, logic
implementation, circuit simulation, chip layout, and testing.
Chip level silicon compilation now allows a systems
engineer to design Very Large Scale Integrated (VLSI)
chips. With a silicon compiler, the design is accomplished by
using a top-down, hierarchical design methodology starting
with a partitioned chip set, proceeding down to individual
chips, modules, and finally blocks. The blocks, or
bottom-level design elements, include various types of logic
blocks such as ALUs, PLAs, RAMs, ROMs, multipliers, and
inverters [Ref. 2].

Generally, far less time is required to design a circuit
with a silicon compiler than is necessary for a comparable
manual/CAD design method using graphic layout tools. The
silicon compiler makes possible the rapid, real time
development and testing of a system. This is advantageous for
designing and producing relatively small numbers of chips.
It is especially attractive for military applications where
small numbers of chips are required (hundreds and thousands
vs. millions) and a rapid turnaround time is desired [Ref. 3].


Reference  [1]  contains  a  directory  with  capability 
comparisons  of  the  silicon  compilers  currently  available  from 
commercial  sources.  The  most  notable  systems  observed  in  the 
directory  concerning  flexibility  and  overall  performance  were 
the  Concorde  Silicon  Compiler,  made  by  the  Seattle  Silicon 
Corp.  and  the  Genesil  Silicon  Compiler  produced  by  Silicon 
Compiler  Corp. 

Currently, the Naval Postgraduate School has the
capability of VLSI design using full custom methods, the
MacPitts Silicon Compiler, and the Genesil Silicon Compiler.
Both the full custom and MacPitts methods depend on separate,
time-consuming programming for simulation and timing analysis
of a VLSI chip. In Genesil, simulation and timing analysis
are integrated into the system.

Chapter II describes the Genesil System's stand-alone
capabilities for the design of a VLSI system. Chapter III
briefly describes system pipelining theory and a comparison of
Genesil library versus custom adders, followed by the design
and performance results of a pipelined 16 bit adder built on
the Genesil Silicon Compiler. Chapter IV contains performance
comparisons of Genesil library multipliers versus custom
multipliers, followed by the design and performance results of
a custom 4 bit pipelined Wallace Tree structured multiplier,
concluding with the design and performance results of a custom
pipelined 16 bit parallel multiplier. The Appendix contains a
tutorial for a top-down VLSI chip design on the Genesil
Silicon Compiler.
II. GENESIL SILICON COMPILER

A. INTRODUCTION

The Genesil Silicon Compiler is based on silicon
compilation, which is an Application Specific Integrated
Circuit (ASIC) design method. ASIC design methodology also
includes full custom, gate array, and standard cell methods.
Full custom design is accomplished by a team of IC experts,
whereas gate array, standard cell, and silicon compilation are
based on the premise that the designer is not an IC expert
[Ref. 4].

B.  ASIC  DESIGN 

Full custom design is normally used by IC manufacturers
producing vast quantities (millions) of standard off-the-shelf
type chips such as microprocessors. The chip is normally
very dense, with a full set of masks and customized probe cards
for production tests, since the designer and user are most
likely not the same [Ref. 2]. Full custom design is time
consuming and expensive, which can be attributed to the
complexity of the design parameters for high density chips.
Design parameters, at the full custom level, are a constant
tradeoff involving performance (speed, power, function), die
size, ease of test generation, and testability [Ref. 5].

Gate array design is accomplished by interconnecting the
appropriate rows and columns of transistors with metal layers
defining the circuit from appropriate netlist libraries. The
array is prefabricated and a circuit design is "fitted" to the
array. Processing time is low, but circuit density is also
low. Gate array vendors provide macros containing predefined
patterns to form SSI circuits such as NAND and NOR gates
[Ref. 4]. These vendor-supplied macros present problems when
attempting to translate a high level specification from one
vendor to another.

Standard cell design is based upon the same methodology
as gate array design, but differs in the manufacturing
cycle. The gate array is a pre-manufactured wafer to which
metal is added to form the IC. The standard cell does not
have pre-defined transistor locations. The manufacturing
process is similar to full custom, requiring all layers to be
created. This does result, however, in a more dense circuit
than that produced by the gate array method.

Silicon compilation produces a circuit which is very
similar to a full custom design by synthesizing the circuit
with a top-down, hierarchical design methodology consisting of
chip sets, individual chips, modules, and blocks [Ref. 5].
Silicon compilation provides the interface between high level
design specifications and a variety of design tools which
produce efficient IC layouts.

C.  GENESIL  SYSTEM  DESCRIPTION 

The Genesil Silicon Compiler System is a design
automation system which provides the user with the capability
of designing VLSI circuits from high level system description
to manufacture tapeout by producing the integrated circuits
from architectural descriptions. The system is composed of
menus, commands, and forms used in the activities described in
the following sections. The system uses the UNIX Operating
System. A Genesil System Overview is shown in Figure 2.1
[Ref. 6:p. 2.3].
1. Methodology

Prior to beginning a Genesil session, the user should
pre-plan all functional and performance specifications. With
this initial plan, the user is able to rapidly observe
exploratory ("first cut") designs of the required
specifications. After as many alterations as desired or
required, the detailed design can be completed to include
simulation and timing analysis. Next, the physical design
process is "invoked" by using the Floorplanning feature.
Once floorplanned, verification is again conducted by
functional simulation for logic, and timing analysis for
performance. The chip is then ready for manufacture
interface. The Genesil methodology is illustrated in Figure
2.2 [Ref. 6:p. 2.8].
Figure 2.2 Genesil Methodology


2. Design

Genesil is an object oriented system which uses a
hierarchical pathname system based on the UNIX operating
system pathnames. Objects are selected, attached, detached,
moved, and removed from the user's account. Objects include
blocks, modules, chips, and chipsets.

Blocks are the lowest level objects in the Genesil
System object hierarchy. Blocks are created by the system
block generator as directed by the user's functional
specifications. There are three types of blocks: independent
blocks, data-path blocks, and random logic blocks.
Independent blocks include complex stand-alone blocks of logic
such as ROMs and PLAs. Data-path blocks are designed
specifically for functions that manipulate parallel data.
Random logic blocks contain user specified gate level logic.

Modules are intermediate objects in the hierarchy.
A module is a collection of blocks, other modules
(submodules), and parallel data-path modules [Ref. 7:p. 1.2].

Chips are complete integrated circuits. Chips
contain blocks, modules, pad specifications, interconnection
lists, positioning information for blocks and modules, and
packaging information. Chipsets are a collection of chips.
Chipsets include chip interconnection lists, and user-supplied
simulation model programs and timing analysis models
[Ref. 6:p. G.4].

After an object is attached from the SELECT_OBJECT
menu, a Header Form is completed which specifies function type
(e.g., RAMs, PLAs, random logic) and fabline. Next, a
Specification Form is used to enter detailed information
about the object. The Specification Form varies, depending on
the object type selected. The Specification Form is the heart
of the design process. This is where all design
specifications are designated by the user: functional objects
are attached or detached, signals are attached or detached,
and bus widths are designated. Once the Specification Form is
completed to the user's specifications, the system generates a
form check which identifies any incorrect signal connections,
which can be corrected immediately. Figure 2.3 [Ref. 6:p. 5.16]
illustrates the design description hierarchy.
3. Netlisting

A net is a connection between two or more objects.
Objects are any one or combination of blocks, modules, chips,
or chipsets. A netlist is a listing of all the nets in a
module or chip. The netlisting feature allows the user to
specify the interconnections between objects.
Object-netlisting and Net-netlisting are two options which may
be selected; they show the same information from different
points of reference. The Object-netlist form is used to define
connections of sub-objects to the system net. The Net-netlist
form is used to specify signal names to be combined into a
network which the system creates. Used in this context, a
signal is synonymous with a node: a single node is
formed from all connectors and nets which are electrically
equivalent. Netlisting can also be accomplished between chips
to form chipsets. Figure 2.4 shows the netlisting commands
hierarchy [Ref. 6:p. 6.12].
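
The idea that a node is formed from all electrically equivalent connectors and nets can be sketched in Python. This is an illustration only and not Genesil code: the function name build_nodes and the connector names are hypothetical, and a union-find structure is assumed as one simple way to merge nets that share a connector.

```python
def build_nodes(nets):
    """Group connectors into nodes of electrically equivalent connectors.

    nets: iterable of nets, each net an iterable of connector names
    (hypothetical names such as "adder.COUT").
    Returns a sorted list of nodes, each a sorted list of connectors.
    """
    parent = {}

    def find(x):
        # Follow parent links to the representative, halving the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for net in nets:
        connectors = list(net)
        for connector in connectors:
            find(connector)                  # register every connector
        for other in connectors[1:]:
            union(connectors[0], other)      # one net = one equivalence

    groups = {}
    for connector in list(parent):
        groups.setdefault(find(connector), set()).add(connector)
    return sorted(sorted(group) for group in groups.values())


nets = [
    ("adder.COUT", "latch.D"),
    ("latch.D", "pad.P3"),      # shares latch.D, so it joins the same node
    ("adder.A", "reg.Q"),
]
print(build_nodes(nets))
```

Two nets that share any connector collapse into one node, which is the sense in which a signal name and a node are interchangeable in the netlist forms.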
4. Floorplanning

Floorplanning consists of the actual placement of
objects on a module or chip, the connection of pins to pads, or
pinout, and fusion order specification. Placement refers to
the actual geographic locations of objects in relation to one
another. The placement feature allows the user to graphically
determine the placement of objects so as to minimize wire
lengths. Genesil provides the appropriate menu depending on
the object type. The pinout option provides for assignment of
external signals to on chip (signals not local to an object)
and off chip (I/O pins and pads) connections and is required
for all modules and chips. Fusion allows the user to create and
modify routing channel assignments on the floorplan by binding
objects together to form channels [Ref. 6:p. 7.15]. Fusion
may be accomplished automatically and then changed manually
from the command selection form for more efficient routing.
Figure 2.5 [Ref. 6:p. 7.25] depicts the Floorplanning Command
Hierarchy.
5. Compiling

Compile is a command option that allows the user to
force an immediate compilation of the complete set of views or
selected views of the selected object. A view is one of three
Genesil System representations of a block: geometric,
functional, and timing. Compilation must be
completed prior to simulation, timing analysis, plot, or
tooling activities. It is done automatically if simulation or
timing analysis is attempted prior to compile selection,
because the system checks the object's files for compilation
currency. For efficiency, it was found that each block of
objects should be compiled at the completion of design and
netlisting. Figure 2.6 [Ref. 8:p. A.7] shows the compile menu
of commands.

COMPILE Menu

SIM_MODEL
TA_MODEL
LOAD_MODEL
GATE_MODEL
LAYOUT
BUILD_ALL
CHECK
AUTO_DEF_SPEC

INTERACTIVE
ABORT_GO_ON

Figure 2.6 Compile Menu

During the initial design phase, the most significant
compile commands are Build_All, Sim_Model, and Ta_Model.
The Build_All command compiles all views of the current
object. Sim_Model compiles the simulation model needed for
simulation of the current object. Ta_Model compiles the timing
model necessary for timing analysis of the current object.

6. Simulation

The purpose of simulation is to verify that the
design and layout generated by the current object are
logically correct. The layout is synonymous with the
geometric view of an object and is the physical equivalent of
that object. The two most significant simulation modes are
functional and switch-level simulation.

Functional  simulation  generates  a  gate-level  model 
for  general  purpose  simulation  and  is  independent  of 
technology  and  layout.  It  is  dependent  only  on  circuit 
functionality  and  input  signal  changes.  To  perform  functional 
simulation,  the  block  must  be  defined  and  netlisted. 

Functional simulation is based on a demand-evaluation
algorithm which creates a functional model that simulates only
the minimum logic required for correct results
[Ref. 9:p. 2.1]. The user designates net values and manually
advances time across a clock edge.
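
The demand-evaluation idea, simulating only the logic that a requested result depends on, can be illustrated with a short Python sketch. The source does not describe the internals of the Genesil simulator, so everything here is an assumption made for illustration: the GATES table, the netlist format, and the function name simulate are hypothetical.

```python
# Hypothetical two-input gate library operating on single bits.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

# Hypothetical netlist of a 1 bit full adder: net -> (gate, input, input).
FULL_ADDER = {
    "P": ("XOR", "A", "B"),
    "SUM": ("XOR", "P", "CIN"),
    "G": ("AND", "A", "B"),
    "T": ("AND", "P", "CIN"),
    "COUT": ("OR", "G", "T"),
}


def simulate(requested, netlist, inputs):
    """Evaluate only the nets that the requested outputs depend on.

    Returns the requested values and the list of nets actually evaluated.
    """
    values = dict(inputs)        # primary inputs are already known
    evaluated = []

    def value_of(net):
        if net not in values:    # evaluate on demand, at most once
            gate, a, b = netlist[net]
            values[net] = GATES[gate](value_of(a), value_of(b))
            evaluated.append(net)
        return values[net]

    results = {net: value_of(net) for net in requested}
    return results, evaluated


results, evaluated = simulate(["SUM"], FULL_ADDER, {"A": 1, "B": 0, "CIN": 1})
print(results, evaluated)   # prints {'SUM': 0} ['P', 'SUM']
```

When only SUM is requested, the carry logic (G, T, and COUT) is never evaluated, which is the saving a demand-driven model aims for.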

Following design functionality verification, the
object is floorplanned, resulting in the compilation of a
layout. From the layout, switch-level simulation is used to
verify functionality of the actual layout of the chip.
Switch-level simulation uses an event-driven algorithm which
is 5-10 times slower than functional simulation and is best
used during final verification only. The algorithm is much
slower because it depends on detailed signal propagation data
extracted from timing analysis [Ref. 9:p. 2.2].

7. Timing Analysis

Timing command selection places the user in the
general Timing Analyser (TA). The system then uses an
algorithm that requires no test vectors and generates the
following timing specs [Ref. 10:p. 1.1]:

Object propagation delays
Paths limiting clock frequency
Duty-cycle constraints
Input setup and hold times
Output delays
Internal node setup, hold times, and signal delays
Path delays between internal nodes

The user then selects a menu command to generate a timing
report for the desired information listed above. The TA can
be used at any time during design following Definition
Specification for random logic objects or blocks, and
following Definition Specification and Floorplanning for
modules or chips. Figure 2.7 [Ref. 10:p. 1.1] shows the Timing
Analysis environment.
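
One way to see why a timing analysis can work without test vectors is a longest-path search over the circuit graph, sketched below in Python. The source does not state which algorithm the Genesil Timing Analyser uses, so this is a generic illustration under stated assumptions: the netlist, the function name worst_case_delays, and the gate delays are all hypothetical and are not Genesil data.

```python
def worst_case_delays(netlist, gate_delay_ns):
    """Latest arrival time (ns) at every net, found by longest-path search.

    netlist: net -> (gate type, list of input nets); any net that is not
    a key of netlist is treated as a primary input arriving at time 0.
    """
    arrival = {}

    def arrive(net):
        if net not in netlist:          # primary input
            return 0.0
        if net not in arrival:          # memoize each net once
            gate, fanin = netlist[net]
            arrival[net] = gate_delay_ns[gate] + max(arrive(f) for f in fanin)
        return arrival[net]

    for net in netlist:
        arrive(net)
    return arrival


# Hypothetical full adder netlist and made-up gate delays.
netlist = {
    "P": ("XOR", ["A", "B"]),
    "SUM": ("XOR", ["P", "CIN"]),
    "G": ("AND", ["A", "B"]),
    "T": ("AND", ["P", "CIN"]),
    "COUT": ("OR", ["G", "T"]),
}
delays = {"XOR": 2.0, "AND": 1.5, "OR": 1.0}

arrival = worst_case_delays(netlist, delays)
print(arrival["SUM"], arrival["COUT"])
```

The search depends only on circuit structure and per-gate delays, never on input values, so no stimulus is needed to find the limiting path.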


8. Manufacture Interface

After a chip is completely specified and verified in
terms of functionality, timing, power dissipation, and size,
the design is ready to be sent to the foundry specified on the
chip's definition header form. The foundry, or factory which
produces the chip, may be changed at any time by changing the
selection on the header form and re-compiling the chip.
Design specifications are altered with each foundry change.
The fabline, or process used to make the chip, changes with
each foundry change, which affects chip size, power, and timing
results. Changes occur because chip layout differs due to the
selected fabline's design rules, and each foundry has its
own models for devices built on that particular fabline
[Ref. 6:p. 11.2].
III. ADDER CIRCUITS

A. PIPELINED CIRCUITS FOR HIGH PERFORMANCE

The purpose of pipelined circuits is to increase the
throughput or performance of a circuit by splitting the task
to be performed into cascaded sub-functions or smaller pieces
and allocating separate hardware to each piece. Each piece or
sub-function is defined as a stage. A stage normally consists
of two components: the combinational logic to
perform the sub-function, and a latch or flip-flop to save the
output of one stage as input to the next. The concept is
analogous to a physical pipeline or automobile assembly line.
Data flows through the stages of the circuit at a rate which
is independent of the length of the pipeline or number of
stages. If the overall function is completed in "X"
nanoseconds (ns), and the function is divided into "N" stages,
or sub-functions, then the output rate of the original function
can theoretically be increased to one result every "X/N" ns.
This results in an "N" fold increase of performance [Ref. 11].

There are physical limitations on "N" due to hardware
technology, the function being pipelined, clock-skew, and
critical race. References 11 and 12 address hardware and
function limitations. Clock-skew, due to circuit lengths,
loading, and driver circuits, makes it nearly impossible to
guarantee that all stages of a pipelined circuit receive the
same pulse at exactly the same time. Critical race refers to
the situation where a datapath through a logic block in a
stage may be so short that, if the latch changes its output
early, the change may reach the next staging latch and change
it during the same clock pulse [Ref. 1]. For this reason,
flip-flops are usually used between logic stages, or a two
phase clock system is used with latches, each phase serving
alternate stage latches. Two stages and a basic pipeline
clock are shown in Figure 3.1 [Ref. 12:p. 36].


Figure 3.1 Two Stages and Pipeline Clock
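
The ideal "N" fold relationship described in this section can be checked with a small Python calculation. It is a sketch of the theoretical case only: latch delay, clock-skew, and critical race are ignored, and the function name pipeline_timing and the example values (a 20 ns function, 4 stages, 1000 results) are hypothetical.

```python
def pipeline_timing(total_delay_ns, stages, items):
    """Ideal timing for `items` results from an N stage pipeline.

    Returns (clock period, unpipelined time, pipelined time, speedup),
    with all times in ns.
    """
    period_ns = total_delay_ns / stages          # one result every X/N ns
    unpipelined_ns = total_delay_ns * items      # one result every X ns
    # The first result needs all N stages; each later one adds one period.
    pipelined_ns = (stages + items - 1) * period_ns
    return period_ns, unpipelined_ns, pipelined_ns, unpipelined_ns / pipelined_ns


period, slow, fast, speedup = pipeline_timing(20.0, 4, 1000)
print(period, slow, fast, round(speedup, 2))
```

For a long stream of data the speedup approaches N, while a single result gains nothing because it must still pass through every stage.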

B. FULL ADDER DESIGN

1. Introduction

Full adders serve a significant role in high
performance pipelined signal processing circuits. High
performance custom signal processing filters consist of
pipelined adders and multipliers built from full adders. The
Genesil Silicon Compiler system library contains full adders
which can be programmed, via a menu, from 1 to 16 bits in
width. The performance and size of a 1 bit library full adder
were compared with the performance and size of a custom full
adder built on the Genesil System. Motivation for collecting
performance and size data stemmed from the research presented
in Chapter IV, where full adders were used to build high
performance pipelined multiplier chips.

2. Genesil Silicon Compiler Library Full Adder

A full adder, shown in Figure 3.2 [Ref. 13:p. 2.13],
was extracted from the Genesil Compiler Library, Volume III,
which is a collection of all system random logic blocks
available. The figure illustrates the only
information visible to the user concerning a full adder.


Figure  3.2  Genesil  View  of  Adder  Block 


The figure depicts two data input buses (A and B) and a
single carry input (CIN), which are added together resulting in
the data output bus (OUT) and a carry output (COUT) [Ref. 3].
The manual offered no internal logic circuit diagrams nor
performance and size specifications for the adder. Since the
adder width can be varied from 1 to 16 bits, the interest was
in whether the system used a general algorithm for adder
construction, possibly resulting in wasted area and
unnecessary hardware, or if it "customized" the adder to user
specifications.

A 1 bit full adder block called lib_fulladd_blk was
constructed, and a VLSI layout is shown in Figure 3.3.
The object size, calculated by the system, was 3.56 X 7.42
mils, resulting in a total area of 26.15 square mils.

Timing analysis was performed on the block, which
resulted in the Timing Analyser output propagation delays
shown in Table I.


TABLE I

GENESIL FULL ADDER OUTPUT PROPAGATION DELAYS

            OUTPUT DELAYS (ns)
OUTPUT      MIN       MAX
COUT        1.9       4.7
SUM         1.6       5.2

3. Custom Full Adder

A typical full adder was constructed from exclusive-OR
gates, AND gates, and an OR gate as shown in Figure 3.4
[Ref. 14].
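
The logic of such a full adder can be written out in Python. Figure 3.4 is not reproduced here, so the gate arrangement below (two exclusive-OR gates, two AND gates, and one OR gate) is the common textbook form and is assumed rather than taken from the figure; the function name full_adder is hypothetical.

```python
def full_adder(a, b, cin):
    """1 bit full adder from two XOR gates, two AND gates, and one OR gate."""
    p = a ^ b                       # first XOR: propagate term
    s = p ^ cin                     # second XOR: sum output
    cout = (a & b) | (p & cin)      # two ANDs feeding one OR: carry output
    return s, cout


# Print the full truth table as A B CIN SUM COUT.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            print(a, b, cin, s, cout)
```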
An object called cus_fulladd_blk was constructed and
a VLSI layout is shown in Figure 3.5.


Figure 3.5 Custom 1 Bit Full Adder Layout

The object size, in mils, as calculated by the system was 7.58
X 2.42. This resulted in a total area of 18.34 square mils.
Timing analysis was performed by the system timing
analyzer, resulting in the output propagation delays presented
in Table II.


TABLE II

CUSTOM FULL ADDER OUTPUT PROPAGATION DELAYS

            OUTPUT DELAYS (ns)
OUTPUT      MIN       MAX
COUT        2.5       5.0
SUM         1.4       4.6

4. Results

The results of a comparison between the system
library full adder and the custom full adder are summarized in
Table III.


TABLE III

SIZE AND PERFORMANCE SUMMARY

                      MAX PROPAGATION    TOTAL AREA
                      DELAY (ns)         (SQ MILS)
GENESIL FULL ADDER    5.2                26.15
CUSTOM FULL ADDER     5.0                18.34

The data indicate that the custom full adder provides a
0.2 ns performance improvement with a 7.81 square mil saving in
area.

C. FOUR BIT ADDER DESIGN

1. Introduction

The four bit adder is the building block for high
performance pipelined adders. A basic four bit adder is a
ripple carry circuit consisting of four full adders. The
carry out of each adder ripples down as one of the three
inputs to the next adder. Performance and size considerations
influence the appropriate design. Designs include pipelining
full adders with latches, four bit carry-look-ahead adders
(CLA), and pipelined CLA. A pure performance pipelined CLA is
described in Reference 15.
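
The ripple carry arrangement of four full adders can be sketched in Python as follows. This is a behavioral illustration only, not a Genesil description; the function names full_adder and ripple_carry_add4 are hypothetical.

```python
def full_adder(a, b, cin):
    """1 bit full adder: returns (sum, carry out)."""
    p = a ^ b
    return p ^ cin, (a & b) | (p & cin)


def ripple_carry_add4(a, b, cin=0):
    """Add two 4 bit values (0..15) with four chained full adders.

    Returns (4 bit sum, carry out). Each stage must wait for the carry
    of the stage before it, which is what limits performance.
    """
    total = 0
    carry = cin
    for i in range(4):                      # bit 0 is the least significant
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add4(9, 7))    # 9 + 7 = 16: sum bits 0, carry out 1
```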

2. Genesil Silicon Compiler Library Four Bit Adder

A library four bit adder called lib_4bit_blk was
constructed by using the Random Logic Block Specification menu
shown in Figure 3.6.

Block type:  ADDER
Block index: 0
Name:        >ADDERO
Width:       >4

Connector  Width  Regime  Timing
A          4      1       Prop(t)
   [3]  >A3
   [2]  >A2
   [1]  >A1
   [0]  >A0
B          4      1       Prop(t)
   [3]  >B3
   [2]  >B2
   [1]  >B1
   [0]  >B0
OUT        4      1       Prop(t)
   [3]  >S3
   [2]  >S2
   [1]  >S1
   [0]  >S0
CIN        1      1       Prop(t)
   [0]  >Ci
COUT       1      1       Prop(t)
   [0]  >Co
FEED       4      1       Feedthru
   [3]  >FAL:
   [2]  >FAL:
   [1]  >FAL:
   [0]  >FAL:

Figure 3.6 Genesil 4 Bit Adder Specification Menu


Signal names were specified for the A and B buses, the 4 bit
sum (OUT), carry-in (CIN), and carry-out (COUT).

A VLSI layout of the adder was constructed by using
the system's plot feature and is shown in Figure 3.7.


Figure 3.7 Genesil Layout lib_4bit_blk

The size of the adder, calculated by the system during layout,
measured 8.77 X 7.42 mils, resulting in a total area of 65.07
square mils.

Timing analysis, provided by the System Timing
Analyzer, produced the output propagation delays for all
output signals. Data are provided in Table IV.

TABLE IV

GENESIL 4 BIT ADDER OUTPUT PROPAGATION DELAYS

            OUTPUT DELAYS (ns)
OUTPUT      MIN       MAX
Co          1.9       9.5
S0          1.6       5.2
S1          3.1       6.3
S2          2.3       8.0
S3          2.1       9.2

Maximum propagation delay was 9.5 ns, occurring at the carry
out (Co) output.

3. Custom Four Bit Carry-Look-Ahead (CLA) Adder

The Genesil System manuals provide no information
concerning any CLA features of the library adder. Without
CLA, addition can become inefficient, but by pipelining
conventional adder circuitry, performance can be increased at
the price of additional hardware. CLA circuits involve more
hardware than ripple carry circuits, but are faster.


The principle of CLA circuits involves anticipating when
and where a carry will be generated. The circuit "looks
ahead" to see where the carry is needed. References 15 and 16
provide detailed CLA algorithm development and analysis. The
circuit shown in Figure 3.8 [Ref. 17] was constructed for a
performance and size comparison with the Genesil 4 bit adder.
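
The look-ahead principle can be illustrated with the standard generate and propagate equations, sketched here in Python. The circuit of Figure 3.8 is not reproduced, so this is the usual textbook formulation and is assumed rather than taken from the figure; the function name cla_add4 is hypothetical.

```python
def cla_add4(a, b, c0=0):
    """4 bit carry-look-ahead addition of two values in 0..15.

    Every carry is computed directly from the generate (g) and
    propagate (p) terms and the carry in, not from the previous carry.
    Returns (4 bit sum, carry out c4).
    """
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]   # stage generates a carry
    p = [x ^ y for x, y in zip(a_bits, b_bits)]   # stage propagates a carry

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4


print(cla_add4(9, 7))    # 9 + 7 = 16: sum bits 0, carry out 1
```

The widening AND and OR terms in c3 and c4 show where the extra hardware goes: each carry costs more gates, but none waits on a rippled carry.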

Figure  3.9  shows  the  Random  Logic  Functional 
Specification  menu  used  to  build  the  CLA  circuit.  Each  random 
logic  object  was  specified,  signals  designated,  and  the 
circuit  net-listed. 


Figure 3.8 4 Bit CLA Adder

Figure 3.9 Random Logic Functional Specification Menu

An object layout of the CLA block called
cus_cla4bit_blk was completed. The system calculated the size
of the object as 44.53 X 2.42 mils, resulting in a total area
of 107.76 square mils.

Since this was a custom object, the System Functional
Simulator was used to test for correct logic. Random values
were put on the inputs, the system clocks were cycled, and the
results were read from the Functional Simulator menu. An
example simulation run extracted from the Functional Simulator
is included as Figure 3.10.


> /cus_cla4bit_blk is of type genblock/rl with 28 ports
> port 0 I A0 to NC = 1
> port 2 I B0 to NC = 1
> port 3 I A2 to NC = 1
> port 13 I Ci to NC = 1

Figure 3.10 4 Bit CLA Simulation

The System Timing Analyser was used to obtain output
propagation delay data for all output signals. The results
are shown in Table V.


The data indicate that the custom CLA provides a 0.5 ns
performance improvement, but is 42.69 square mils larger than
the library adder. The performance of the library adder
indicates that the circuit probably has CLA circuitry included
in the system adder algorithm.
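The carry lookahead principle behind the custom block can be illustrated with a short behavioural sketch. It is not the netlist of Figure 3.8: the function name `cla4` and the bit ordering are assumptions made for this example. Each carry is written as a two-level function of the generate and propagate terms, so no carry waits on a ripple from the previous bit.

```python
def cla4(a, b, cin=0):
    """Add two 4-bit values with carry lookahead; returns (sum, carry_out)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # generate terms
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]  # propagate terms
    c0 = cin & 1
    # Every carry depends only on g, p and c0 (no ripple between bit positions).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    # Sum bit i is the propagate term XORed with the carry into bit i.
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4
```

For example, `cla4(0b1111, 0b1111, 0)` returns the 4-bit sum `0b1110` with a carry out of 1.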

D.  SIXTEEN  BIT  ADDER  DESIGN 

1. Introduction

The pipelined 16 bit adder can be used in conjunction
with two's complement hardware for special purpose signal
processors, in the final stages of a 16 bit multiplier as
presented in Chapter IV, or in various other capacities
involved with high performance special purpose hardware. This
section compares the performance and size of a 16 bit Genesil
library adder with a custom 16 bit adder built on the Genesil
System. Additional pipelined adder designs and performance
data are available in Reference 18.

2. Genesil Library 16 Bit Adder

A  library  16  bit  adder  called  lib_16bit_blk  was 
constructed  by  using  the  Random  Logic  Block  Specification  menu 
shown  in  Figure  3.11. 

Block type:  ADDER
Block index: 0
Name:        >ADDER0
Width:       >16

Connector   Width   Regime   Timing     Signal
A           16      1        Prop(t)    >A[15:0]
B           16      1        Prop(t)    >B[15:0]
OUT         16      1        Prop(t)    >S[15:0]
CIN         1       1        Prop(t)    >Ci
COUT        1       1        Prop(t)    >Co
FEED        16      1        Feedthru   >FALSE*16

Figure 3.11 Genesil 16 Bit Adder Specification Menu

Input signal names, in bus notation, were specified for the A
(A[15:0]) and B (B[15:0]) buses, and Cin. Output signals
included sum (S[15:0]), in bus notation, and Co.

A VLSI layout from the system plot feature is included
as Figure 3.12.

The size of the library 16 bit full adder was calculated by
the system to be 29.56 X 7.42 mils, resulting in a total area
of 219.33 square mils.

Output propagation delays for all signals were
calculated by the System Timing Analyser and are presented in
Table VII.

TABLE VII

GENESIL 16 BIT ADDER OUTPUT PROPAGATION DELAYS

OUTPUT DELAYS (ns)


Maximum propagation delay was 27.2 ns, occurring at the carry
out (Co) signal.

3. Custom 16 Bit Pipelined Adder

A custom 16 bit adder was constructed using Genesil
library 4 bit adders for the add logic, and 2-phase library D
flip/flops to retain the data between each stage. The library
adders were used because the performance and size comparison
with the 4 bit CLA previously completed indicated that the
differences were insignificant for the purposes of this
section. An attempt to build a custom Earle latch, as
presented in Reference 12, failed because the system did not
allow random logic gate output signals to serve simultaneously
as both external output signals and internal feedback
signals. This capability is required for memory in both the D
flip/flop and the Earle latch. The problem was not pursued
further because the library D flip/flop setup and hold time was
found not to exceed 6.5 ns. This was approximately 3.0 ns
less than the delay of the 4 bit library adder used in each
stage of the adder. The adder logic, therefore, was the
dominating delay factor driving the clock speed. Figure 3.13
shows the design, in block diagram form, used to construct the
adder on Genesil.
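The staging described above can be modelled behaviourally as four 4 bit add stages separated by registers. The sketch below is only an illustration under stated assumptions: the class `PipelinedAdder16`, the helper names, and the register layout are hypothetical, and the output latch stage of the Genesil module is not modelled. Each clock moves every operand pair one stage forward, so a new pair can be accepted on every clock while earlier pairs are still in flight.

```python
def add4(a, b, cin):
    """4-bit add slice: returns (4-bit sum, carry out)."""
    total = a + b + cin
    return total & 0xF, total >> 4

def stage(state, k):
    """Apply adder stage k (0..3) to the contents of a pipeline register."""
    if state is None:                      # empty slot (bubble)
        return None
    a, b, partial, carry = state
    s, carry = add4((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF, carry)
    return a, b, partial | (s << 4 * k), carry

class PipelinedAdder16:
    """Four add stages; one register after each stage."""
    def __init__(self):
        self.regs = [None] * 4

    def clock(self, operands=None):
        """Advance one clock; operands is (a, b) or None. Returns (sum, carry) or None."""
        first = None if operands is None else (operands[0], operands[1], 0, 0)
        inputs = [first] + self.regs[:3]   # each stage reads the previous register
        self.regs = [stage(x, k) for k, x in enumerate(inputs)]
        done = self.regs[3]
        return None if done is None else (done[2], done[3])
```

In this model the first result appears on the fourth clock, and one result is produced per clock thereafter, which is the throughput advantage that pipelining is intended to provide.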

Figure  3.14  shows  a  VLSI  layout  of  the  custom  16  bit 
pipelined  adder  module,  without  input/output  pads,  constructed 
for  performance  and  size  comparison  with  the  library  adder. 

The module was floorplanned using the list-best command in the
floorplan menu. This feature graphically advised the user of
the best placement of the 5 stages in the module for optimum
routing and fusion. The adder module was 21.89 X 339.19 mils,
resulting in a total area of 7424.87 square mils.




Figure 3.15 Manually Floorplanned 16 Bit Pipelined Adder

Although this design was significantly smaller than the one
floorplanned by the system, routing and fusion of the
system-floorplanned module took significantly less time
to complete. The system-floorplanned module took 10 to 15
minutes to route, while the manually floorplanned module took
1 to 2 hours to complete the routing.

Figure 3.16 shows the pipelined adder with the pads
attached. This figure was included to demonstrate the
significant increase in chip size experienced with the
addition of pads and associated routing. The size of the chip
increased to 171.83 X 385.83 mils, resulting in a total area
of 66,297.17 square mils.

Simulation was performed for logic validation using
the Functional Simulator. Various combinations of binary
integers were placed on the input signal buses, the system
clocks cycled, and test results were observed on the output
signal buses. Figure 3.17 shows a sample of a simulation test
run extracted from the Functional Simulation output form.


> is of type module with 48 ports
> port 1 I TRUE to NC = H
> port 3 I FALSE to NC = L
> port 5 O a1[15:0] to NC*16 = 1111111111111110
> port 7 I a[15:0] to NC*16 = HHHHHHHHHHHHHHHH
> port 9 O b1[15:4] to NC*12 = 111111111111
> port 11 I b[15:0] to NC*16 = HHHHHHHHHHHHHHHH
> port 13 O c1o to NC = 1
> port 15 CI phase_a to NC = 1
> port 17 CI phase_b to NC = 0
> port 20 O a2[15:8] to NC*8 = 11111111
> port 21 O a2[3:0] to NC*4 = 1111
> port 23 O b2[15:8] to NC*8 = 11111111
> port 25 O c2o to NC = 1
> port 27 O d0[3:0] to NC*4 = 1110
> port 30 O a3[15:12] to NC*4 = 1111
> port 31 O a3[3:0] to NC*4 = 1111
> port 33 O b3[15:12] to NC*4 = 1111
> port 35 O c3o to NC = 1
> port 37 O d3[7:0] to NC*8 = 11111110
> port 39 O a4[3:0] to NC*4 = 1111
> port 41 O c4 to NC = 1
> port 43 O d4[11:0] to NC*12 = 111111111110
> port 45 O carry to NC = 1
> port 47 O sum[15:0] to NC*16 = 1111111111111110

Figure 3.17 Simulation Results of 16 Bit Pipelined Adder


Timing analysis was performed to investigate the
worst case delay in each stage of the adder. Figure 3.18
through Figure 3.22 list the worst case propagation delays and
identify the worst case signal for each of the five stages
of the pipelined adder. The data indicate that the largest
propagation delay was 9.5 ns, which occurred in stages 1
through 4. This delay is attributed to the library 4 bit
adder used in each of these stages.
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OUTPUT  DELAYS 

(ns)

Output      Ph1(r) Delay      Ph2(r) Delay
            Min     Max       Min     Max
a1[0]       3.5     4.4       3.5     4.4
a1[10]      4.3     4.5       -       -
a1[11]      4.3     4.5       -       -
a1[12]      4.3     4.5       -       -
a1[13]      4.3     4.5       -       -
a1[14]      4.3     4.5       -       -
a1[15]      4.3     4.5       -       -
a1[1]       3.1     6.3       3.1     6.3
a1[2]       2.3     8.0       2.3     8.0
a1[3]       3.1     9.2       3.1     9.2
a1[4]       3.8     4.0       -       -
a1[5]       3.8     4.0       -       -
a1[6]       3.8     4.0       -       -
a1[7]       3.8     4.0       -       -
a1[8]       4.3     4.5       -       -
a1[9]       4.3     4.5       -       -
b1[10]      4.3     4.5       -       -
b1[11]      4.3     4.5       -       -
b1[12]      4.3     4.5       -       -
b1[13]      4.3     4.5       -       -
b1[14]      4.3     4.5       -       -
b1[15]      4.3     4.5       -       -
b1[4]       3.8     4.0       -       -
b1[5]       3.8     4.0       -       -
b1[6]       3.8     4.0       -       -
b1[7]       3.8     4.0       -       -
b1[8]       4.3     4.5       -       -
b1[9]       4.3     4.5       -       -
c1o         1.9     9.5       1.9     9.5

Figure 3.18 Stage_1 Output Delays
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OUTPUT  DELAYS  (ns) 


Output      Ph1(r) Delay      Ph2(r) Delay
            Min     Max       Min     Max
a2[0]       1.6     5.2       1.6     5.2
a2[10]      4.3     4.5       -       -
a2[11]      4.3     4.5       -       -
a2[12]      4.3     4.5       -       -
a2[13]      4.3     4.5       -       -
a2[14]      4.3     4.5       -       -
a2[15]      4.3     4.5       -       -
a2[1]       3.1     6.3       3.1     6.3
a2[2]       2.3     8.0       2.3     8.0
a2[3]       3.1     9.2       3.1     9.2
a2[8]       4.3     4.5       -       -
a2[9]       4.3     4.5       -       -
b2[10]      4.3     4.5       -       -
b2[11]      4.3     4.5       -       -
b2[12]      4.3     4.5       -       -
b2[13]      4.3     4.5       -       -
b2[14]      4.3     4.5       -       -
b2[15]      4.3     4.5       -       -
b2[8]       4.3     4.5       -       -
b2[9]       4.3     4.5       -       -
c2o         1.9     9.5       1.9     9.5
d0[0]       3.5     3.7       -       -
d0[1]       3.5     3.7       -       -
d0[2]       3.5     3.7       -       -
d0[3]       3.5     3.7       -       -

Figure 3.19 Stage_2 Output Delays
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OUTPUT  DELAYS  (ns) 


Output      Ph1(r) Delay      Ph2(r) Delay
            Min     Max       Min     Max
a3[0]       1.6     5.2       1.6     5.2
a3[12]      3.8     4.0       -       -
a3[13]      3.8     4.0       -       -
a3[14]      3.8     4.0       -       -
a3[15]      3.8     4.0       -       -
a3[1]       3.1     6.3       3.1     6.3
a3[2]       2.3     8.0       2.3     8.0
a3[3]       3.1     9.2       3.1     9.2
b3[12]      3.8     4.0       -       -
b3[13]      3.8     4.0       -       -
b3[14]      3.8     4.0       -       -
b3[15]      3.8     4.0       -       -
c3o         1.9     9.5       1.9     9.5
d3[0]       3.8     4.0       -       -
d3[1]       3.8     4.0       -       -
d3[2]       3.8     4.0       -       -
d3[3]       3.8     4.0       -       -
d3[4]       3.8     4.0       -       -
d3[5]       3.8     4.0       -       -
d3[6]       3.8     4.0       -       -
d3[7]       3.8     4.0       -       -

Figure 3.20 Stage_3 Output Delays


OUTPUT DELAYS (ns)

Output      Ph1(r) Delay      Ph2(r) Delay
            Min     Max       Min     Max
a4[0]       1.6     5.2       1.6     5.2
a4[1]       3.1     6.3       3.1     6.3
a4[2]       2.3     8.0       2.3     8.0
a4[3]       3.1     9.2       3.1     9.2
c4          1.9     9.5       1.9     9.5
d4[0]       4.0     4.2       -       -
d4[10]      4.0     4.2       -       -
d4[11]      4.0     4.2       -       -
d4[1]       4.0     4.2       -       -
d4[2]       4.0     4.2       -       -
d4[3]       4.0     4.2       -       -
d4[4]       4.0     4.2       -       -
d4[5]       4.0     4.2       -       -
d4[6]       4.0     4.2       -       -
d4[7]       4.0     4.2       -       -
d4[8]       4.0     4.2       -       -
d4[9]       4.0     4.2       -       -

Figure 3.21 Stage_4 Output Delays
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OUTPUT  DELAYS 

(ns)

Output      Ph1(r) Delay      Ph2(r) Delay
            Min     Max       Min     Max
carry       3.3     3.5       -       -
sum[0]      4.3     4.5       -       -
sum[10]     4.3     4.5       -       -
sum[11]     4.3     4.5       -       -
sum[12]     4.3     4.5       -       -
sum[13]     4.3     4.5       -       -
sum[14]     4.3     4.5       -       -
sum[15]     4.3     4.5       -       -
sum[1]      4.3     4.5       -       -
sum[2]      4.3     4.5       -       -
sum[3]      4.3     4.5       -       -
sum[4]      4.3     4.5       -       -
sum[5]      4.3     4.5       -       -
sum[6]      4.3     4.5       -       -
sum[7]      4.3     4.5       -       -
sum[8]      4.3     4.5       -       -
sum[9]      4.3     4.5       -       -

Figure 3.22 Stage_5 Output Delays


4. Results

The results of the comparison between a standard 16
bit library adder and the custom pipelined 16 bit adder are
shown in Table VIII. These figures do not include the delay
associated with interstage flip-flops.


TABLE VIII

SIZE AND PERFORMANCE SUMMARY

                                 TOTAL AREA    MAXIMUM PROPAGATION
                                 (Sq mils)     DELAY (ns)
GENESIL 16 BIT ADDER             219.33        27.2
CUSTOM 16 BIT PIPELINED ADDER    7424.87       9.5
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PERFORMANCE  SUMMARY 


Table IX is a summary of the size and performance results
of the adders designed and constructed in this chapter.


TABLE IX

ADDERS SIZE AND PERFORMANCE SUMMARY

                                 TOTAL AREA    MAXIMUM PROPAGATION
                                 (Sq mils)     DELAY (ns)
GENESIL FULL ADDER               26.15         5.2
CUSTOM FULL ADDER                18.34         5.0
GENESIL 4 BIT ADDER              65.07         9.4
CUSTOM 4 BIT CLA ADDER           107.76        9.0
GENESIL 16 BIT ADDER             219.33        27.2
CUSTOM 16 BIT PIPELINED ADDER    7424.87       9.5

The data indicate that there is no significant performance
advantage gained with the custom full and 4 bit adders. The
custom 16 bit pipelined adder data clearly illustrate the
performance advantage gained by pipelining.


IV.


MULTIPLIER CIRCUITS


A. INTRODUCTION

The binary multiplier is a major component of signal
processor filters. Conventional ALU add and shift multiply
functions and parallel multiplier circuits are not adequate
for the speed requirements of the high speed processor. This
chapter presents performance data on 4, 8, and 16 bit unsigned
library multipliers, which are compared to custom 4 and 16
bit pipelined multipliers.

The multiplication add-and-shift algorithm for two n-bit
binary numbers is represented by Equation 4.1 [Refs. 11 and
16].

$$p = \sum_{k=0}^{n-1} 2^{k} a_{k} b \qquad (4.1)$$

$b$ represents the n-bit multiplicand vector, $a_k$
represents bit $k$ of the multiplier vector $a$, and $p$
represents the 2n bit product vector. [Ref. 16:p. 11]

This concept is illustrated in Figure 4.1 [Ref. 16:p. 11] for
the product of two 8-bit integers. The multiplication of the
two 8-bit integers results in eight partial products,
generated from the ANDing of the multiplicand (MC) and
multiplier (MP) bits, which are then added to form the final
product.
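The add-and-shift scheme of Equation 4.1 can be sketched in a few lines. The function names below are assumptions made for this illustration and are not part of the Genesil system: each multiplier bit is ANDed with the multiplicand to form a partial product, which is shifted left by the bit position before the partial products are summed.

```python
def partial_products(a, b, n):
    """Return the n shifted partial products 2**k * a_k * b of Equation 4.1.

    a is the n-bit multiplier, b is the multiplicand.
    """
    return [(((a >> k) & 1) * b) << k for k in range(n)]

def multiply(a, b, n):
    """Sum the partial products to form the 2n-bit product."""
    return sum(partial_products(a, b, n))
```

For example, `partial_products(0b1101, 0b111001, 4)` yields the four rows 57, 0, 228, and 456, which sum to 741, the product of 13 and 57.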

                                X7  X6  X5  X4  X3  X2  X1  X0   MULTIPLICAND
                                Y7  Y6  Y5  Y4  Y3  Y2  Y1  Y0   MULTIPLIER

                                A7  A6  A5  A4  A3  A2  A1  A0   PARTIAL PRODUCTS
                            B7  B6  B5  B4  B3  B2  B1  B0
                        C7  C6  C5  C4  C3  C2  C1  C0
                    D7  D6  D5  D4  D3  D2  D1  D0
                E7  E6  E5  E4  E3  E2  E1  E0
            F7  F6  F5  F4  F3  F2  F1  F0
        G7  G6  G5  G4  G3  G2  G1  G0
    H7  H6  H5  H4  H3  H2  H1  H0

S15 S14 S13 S12 S11 S10 S9  S8  S7  S6  S5  S4  S3  S2  S1  S0   FINAL PRODUCT

Figure 4.1 Two 8-Bit Integer Product


B. 4, 8, AND 16 BIT GENESIL LIBRARY MULTIPLIER

1. Multiplier Block Array Core

The Genesil library multiplier block array core is an
array of half and full adders which provides a parallel
multiplier scheme for integer and fraction/mantissa portions
of floating-point numbers. In-depth parallel multiplier
theory and schematics are contained in Reference 4:pp. 344-348.
The multiplier block array core and its operation,
illustrating the product of 111001 multiplied by 1101, are
shown in Figure 4.2 [Ref. 13:p. 4.3]. The block is designed for
unsigned integer multiplication and requires external
circuitry for signed operations. The least significant (LS)
bits are produced directly from the array, but an external
adder is required to complete the partial product addition of
the most significant (MS) bits. The multiplier and
multiplicand widths can be varied from 4 to 32 bits, but the
multiplier width cannot exceed the multiplicand width.
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2. 


Three  modules,  containing  4,  8,  and  16  bit  Genesil 
library  multipliers  and  external  adders,  were  constructed  from 
the  Specification  Menu.  Each  was  simulated  for  correct  logic 
and  processed  by  the  Timing  Analyser  for  propagation  delay 
data.  The  results  are  summarized  in  Table  X. 


TABLE X

4, 8, AND 16 BIT OUTPUT PROPAGATION DELAYS

                 OUTPUT PROPAGATION DELAYS (ns)
                 MIN            MAX
4 BIT
  MS_SUM         3.2            10.9
  LS_OUT         3.1            10.3
  MS_OUT         5.3            18.8
8 BIT
  MS_SUM         4.1            24.4
  LS_OUT         3.2            23.4
  MS_OUT         5.0            38.2
16 BIT
  MS_SUM         4.6            51.0
  LS_OUT         [illegible]    49.4
  MS_OUT         6.9            77.0

The  MS_SUM  data  are  the  propagation  delays  of  the  array  core 
only.  MS_OUT  is  the  total  propagation  delay  of  both  the  core 
and  external  adder. 

The addition of D flip/flops between the core and
external adder in Figure 4.2 decreased the propagation delay
driving the maximum clock speed allowable for the circuit to
that of the MS_SUM output propagation delay. The D F/F inputs
were MS_SUM, MS_CARRIES, and LS_OUT. MS_SUM and MS_CARRIES
were then clocked into the external adder or next stage of the
pipeline. Table XI illustrates the theoretical allowable
clock speed of a circuit using the multiplier modules,
considering each module with and without the D flip/flop
insertion. For the modules without D flip/flops inserted
between the core and external adder, clock speeds were
calculated assuming there was a D flip/flop attached to the
outputs of the external adder and LS_OUT.


TABLE XI

THEORETICAL CLOCK SPEED OF
4, 8, AND 16 BIT LIBRARY MULTIPLIER

                        CLOCK SPEED (MHZ)
                        WITHOUT D F/F    WITH D F/F
                        INSERTED         INSERTED
4 BIT MULTIPLIER        39.5             57.4
8 BIT MULTIPLIER        22.3             32.3
16 BIT MULTIPLIER       [illegible]      17.3

The  data  indicate  that  there  was  a  significant  increase  of  the 
allowable  clock  speed  of  a  circuit  using  the  multiplier 
modules  with  the  addition  of  the  D  flip/flop  inserted  between 
the  core  and  external  adder. 
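The relationship between propagation delay and theoretical clock speed can be sketched as follows. This is an assumption rather than a formula stated in the text: the clock period is taken as the worst case logic delay plus the 6.5 ns D flip/flop setup and hold allowance quoted in Chapter III, and the result is truncated to one decimal place. Under that assumption the legible clock speeds tabulated in this chapter are reproduced. The names `FF_NS` and `clock_mhz` are hypothetical.

```python
import math

FF_NS = 6.5  # assumed D flip/flop setup and hold allowance in ns

def clock_mhz(logic_delay_ns, ff_ns=FF_NS):
    """Theoretical clock speed in MHz, truncated to one decimal place."""
    period_ns = logic_delay_ns + ff_ns
    # 1000 / period_ns converts a period in ns to a frequency in MHz.
    return math.floor(10 * 1000.0 / period_ns) / 10
```

For the 4 bit library multiplier, `clock_mhz(18.8)` gives 39.5 MHz using the full MS_OUT delay, and `clock_mhz(10.9)` gives 57.4 MHz once the D flip/flops limit the critical path to the MS_SUM delay.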
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c. 


4 BIT PIPELINED MULTIPLIER

1. Introduction

This section presents performance data for a 4 bit
pipelined multiplier using the Wallace Tree structure. Figure
4.3 illustrates a 4X4 multiplication in dot form [Ref. 16].


Figure  4.3  4X4  Multiply 

After the partial products are formed, the three right columns
of partial products are shifted down to form a pyramid or
tree, as illustrated in Figure 4.4 [Ref. 16].


Figure 4.4 Wallace Tree Partial Products

Next, 3-input, 2-output full adders are used to compute carry
save addition (CSA) for column reduction.

To reduce these columns of height h, CSA is used to reduce
three dots of column height to two. These two output dots,
which represent the familiar sum and carry outputs of a full
adder, are placed in the next level of the tree structure in
their appropriate positions. [Ref. 16:p. 16]

This concept is illustrated in Figure 4.5 for a 4X4
multiplication.


FIRST  LEVEL  CSA 

SECOND  LEVEL  CSA 

LAST  LEVEL  CSA 

Figure  4.5  CSA  Reduction  4X4  Multiplication 

Once reduced to the last level addition, various CLA and
pipelined ripple adder designs are available for increased
performance.
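Carry save reduction can be illustrated with a word-level model. The sketch below is a simplification under stated assumptions: it compresses whole partial product rows three at a time rather than individual column dots as in Figure 4.5, and the names `csa` and `wallace_multiply` are chosen for this example only. Each 3-input, 2-output compression leaves the arithmetic total unchanged, and only the final two rows require a carry-propagating addition.

```python
def csa(x, y, z):
    """3:2 carry-save adder on whole words: returns (sum_word, carry_word)."""
    s = x ^ y ^ z                                # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1       # carries, moved one column left
    return s, c

def wallace_multiply(a, b, n=4):
    """Multiply by carry-save reduction of the n partial product rows."""
    rows = [(((a >> k) & 1) * b) << k for k in range(n)]  # AND-gate partial products
    while len(rows) > 2:
        reduced = []
        # Compress rows three at a time; leftover rows pass to the next level.
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(csa(*rows[i:i + 3]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
    return sum(rows)  # last level: ordinary carry-propagate addition
```

With four partial products the loop runs twice (4 rows to 3, then 3 rows to 2) before the final addition.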
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2. 


4  Bit  Pipelined  Multiplier  Design 

The  4  bit  pipelined  multiplier  was  designed  using  the 
Wallace  Tree  Structure,  with  D  flip/flops  inserted  for 
pipelining.  A  block  diagram  of  the  module  is  shown  in  Figure 
4.6.  All  partial  products  were  generated  simultaneously  by 
the  16  AND  gates.  Next,  the  partial  products  were  reduced 
using  the  Wallace  Tree  concept  described  in  section  1.  The 
final  level  additions  were  computed  by  a  pipelined  Genesil 
library  2  bit  ripple  adder  and  a  3  bit  ripple  adder. 
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LEVEL  ONE  (W  1) 


The module consisted of 4 blocks called W_1, W_2,
W_3, and W_4. The module was attached to a chip,
floorplanned, and simulated to test for correct logic. Figure
4.7 depicts the floorplan.

Figure  4.8  shows  the  4  Bit  Multiplier  Chip  with  pads, 
clock,  ground,  and  power.  Timing  analysis  was  performed  by 
the  system  Timing  Analyser  for  output  propagation  delays  at 
each  level.  The  results  are  presented  in  Table  XII. 

TABLE  XII 

4  BIT  PIPELINED  MULTIPLIER  OUTPUT  PROPAGATION  DELAYS 
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D.  16  BIT  PIPELINED  MULTIPLIER 

The purpose of the work reported in this section was to
build a strictly high performance 16 bit pipelined multiplier
chip which could be rapidly tested and de-bugged. Two designs
were considered: the Wallace Tree structure, and a ripple
adder design using a pipelined parallel multiplier with
all partial products computed prior to array entry.

The Wallace Tree structure was rejected because, while it
saved only two levels of logic, the design presented serious
de-bugging difficulties. It was found to be extremely
difficult to trace and debug signal errors when the column
height was 16.

The design used was the pipelined parallel multiplier.
The primary advantage of this design was found to be the
relative ease of de-bugging the chip. The primary
disadvantage was the additional cost in hardware and chip size
associated with the D flip/flop delays used to align and save
intermediate results [Ref. 12:pp. 51-53].

1. 16 Bit Pipelined Multiplier Design

The  16  bit  pipelined  multiplier  was  designed  using  a 
pipelined  parallel  multiplier,  and  pipelined  ripple  carry 
adder  hardware  for  summing  the  final  partial  products.  A 
design  block  diagram  is  shown  in  Figure  4.9.  All  partial 
products  were  generated  simultaneously  by  the  256  AND  gates. 
This  initial  partial  product  generation  is  also  necessary  for 
the  Wallace  Tree  structure.  Partial  product  reduction  can  be 


D F/F BANK

LEVELS 3 THROUGH 8 ARE DUPLICATIONS OF LEVEL 2


Figure  4.9  Custom  16  Bit  Pipelined  Multiplier  Block  Design 
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Eleven  levels  of  blocks  were  used  to  construct  the 
multiplier. These were then attached to a chip, which also
included input/output pads, clock, power, and ground. The
chip was then floorplanned, and the floorplan, with pads, is
shown in Figure 4.10.

The chip was tested for correct logic using the
system simulation feature. Ten to fifteen random 16 bit
unsigned integers were inserted on the input signals.
Although the tests run were not all inclusive, the results
indicated correct logic for the inputs tested.

Timing analysis was performed by the system Timing
Analyser for output propagation delays at each level. The
results are presented in Table XIII.


TABLE XIII

16 BIT PIPELINED MULTIPLIER OUTPUT PROPAGATION DELAYS

LEVEL       OUTPUT PROPAGATION DELAYS (ns)
            MIN       MAX
1           4.7       5.8
2-8         7.2       10.4
9           7.9       8.1
10          7.9       8.1
11          7.9       8.1

The data indicate that the longest delay in the circuit is
10.4 ns, occurring in each of levels 2 through 8.


E. PERFORMANCE RESULTS

Table XIV is a summary of the performance results of the
multipliers designed and constructed in this chapter.
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TABLE  XIV 

MULTIPLIER PERFORMANCE RESULTS

                                MAX DELAY      NO. OF    CLOCK RATE
                                (ns)           STAGES    (MHZ)
4 BIT GENESIL (WITH LATCH)      10.9           1         57.4
8 BIT GENESIL (WITH LATCH)      24.4           1         32.3
16 BIT GENESIL (WITH LATCH)     51.0           1         17.3
4 BIT WALLACE (PIPELINED)       [illegible]    1         70.9
16 BIT PARALLEL (PIPELINED)     10.4           8         59.1

The data clearly illustrate the performance advantage gained
by using the custom pipelined multipliers for high performance
tasks.
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V. 


CONCLUSIONS 


A.  SUMMARY 

This thesis has described the applications of silicon
compilers, and the design methodology of the Genesil Silicon
Compiler. The Genesil Silicon Compiler methodology was
demonstrated with the design and verification of custom
pipelined adder and multiplier circuits.

The Genesil Silicon Compiler system is a rapid and
efficient stand-alone tool for algorithm to hardware
implementation and verification. Rapid iterative design,
simulation, and timing analysis are possible because the
system requires no user-initiated programming.

The Genesil system user's manuals state that it is
assumed the user has attended the Genesil Silicon Compiler
user school. The manuals are reference manuals, and not
tutorials for new users. The new user, however, can rapidly
learn the system.

The user should thoroughly pre-plan design and
performance specifications because there is not a plot "screen
dump" capability on the system. The user must manually track
and record all signal and object changes if an updated design
plot is desired at the end of a session.

Object compiling and channel routing times for the
circuits designed in this thesis were longer than anticipated.
In order to expedite object compiling during design iteration
and de-bugging, object sizes (i.e., blocks, modules) should be
as small as practicable. The system auto placement and
routing features decreased routing times, but were less
efficient than manual placement for overall object size.

Complete  chips,  with  all  associated  hardware,  consumed 
much  system  memory.  During  thesis  research,  chips  and  objects 
were  stored  on  tape  backups  when  memory  availability  became 
critical.  All  design  and  performance  specifications  can  be 
verified  at  the  block  and  module  levels,  which  saves  memory 
and  routing  time. 


B.  RECOMMENDATIONS 

The  following  recommendations  should  be  considered: 

1. Research the area of optimum chip test algorithms
prior to foundry tapeout. Investigate the full Genesil
Compiler System Corporation's capabilities in the test area.

2. Purchase a plotter for plot "screen dumps" for rapid
intermediate design schematics.

3. Transfer the system to the VAX 785 for more memory
capability and faster tape storage capabilities.

4. Following system transfer to the VAX 785, establish a
user custom library for high performance modules including
pipelined integer multipliers, floating point multipliers,
signed multipliers, and adders.

5. Do design, layout, simulation and Timing Analysis
without pads for memory and routing time efficiency.


APPENDIX 

GENESIL SILICON COMPILER TUTORIAL

A. INTRODUCTION

The purpose of this tutorial is to guide the new user
through the mechanics of a Genesil system hierarchical
top-down chip design. Designs may be implemented either
top-down or bottom-up. The tutorial begins with designing two
basic blocks, followed by a multiplier module, and concludes
with the design of a chip which uses the two blocks as its core.

Prior to beginning the initial session, the user should
become familiar with the System Description Users Manual, in
particular Chapters 2 and 3. The next manual of interest is
the System Description Application Commands manual, which
contains detailed explanations of user invoked commands. The
new user should periodically refer to Appendix A (Genesil
System Menu Map) of the System Description Application
Commands manual during initial sessions.

1.  Design  Method 

All design and performance specifications should be
pre-planned, including a detailed sketch with all signal
names. The basic stand-alone object which can be attached to
a chip is the block. Blocks may be attached to modules or
chips, but not to other blocks. Modules, the intermediate
object in the hierarchy, may be attached to other modules and
chips. There are no chip size constraints in the Genesil
system, although chip size is dictated by the selected foundry.
Large designs can, therefore, be implemented with chipsets.
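The attachment rules of the object hierarchy can be summarized as a small lookup. This is only a sketch of the rules stated above; the table `ALLOWED_PARENTS` and the function `can_attach` are hypothetical names and do not correspond to any Genesil command.

```python
# Which object types each object type may be attached to (from the rules above).
ALLOWED_PARENTS = {
    "block": {"module", "chip"},    # blocks attach to modules or chips, not blocks
    "module": {"module", "chip"},   # modules attach to other modules or chips
}

def can_attach(child, parent):
    """Return True if an object of type child may be attached to type parent."""
    return parent in ALLOWED_PARENTS.get(child, set())
```

For instance, `can_attach("block", "block")` is False, while `can_attach("module", "module")` is True.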

2. Operating System

Genesil  runs  under  the  UNIX  operating  system.  The 
user  is  referred  to  the  UNIX  For  Genesil  Users  manual  for 
detailed  UNIX  pathnames  information.  The  pathname  is  the  full 
name  of  an  object  in  the  Genesil  system.  The  user  is  referred 
to  page  4.8  in  the  System  Description  Users  Manual  for  naming 
conventions  details. 


B. TUTORBLK_1 BLOCK

This section contains a step-by-step design of a block
named tutorblk_1. The block will contain two random logic
objects, which are a 4 bit adder (A0) and a 5 bit D F/F (DFF1).
The pre-planned schematic of the block, including all signal
names, is shown in Figure A.1.


tutorblk_1

Cin, a[3:0], b[3:0]  ->  A0  ->  A0[3:0], A0_COUT  ->  DFF1  ->  DFF1[4:0]



Since  the  system  is  menu  driven,  it  is  important  to  have  a 
detailed  schematic  with  signal  names  clearly  marked.  The  same 
signal  name  cannot  both  enter  and  leave  the  same  object. 

All commands may be executed in one of three ways: by typing the command
next to the prompt followed by a RETURN, by using the arrow
buttons located on the upper right side of the keyboard to
scroll through the commands followed by a RETURN, or by using
the MOUSE. All following command instructions assume the user
is using the MOUSE. The instruction "select SOME_THING" means
use the mouse to move the cross-hairs to SOME_THING and press
the execute button (right hand button) on the mouse.

1.  While in the Executive menu (upper right corner of the
screen):

a.  Following LOGIN and GENESIL entry, select
CONTINUE.

b.  Select SELECT_OBJECT (Figure A.2). This is
normally the initial command, since it is used to attach objects
to the user tree.

c.  Select ATTACH (Figure A.3), followed by NEW
(Figure A.4), since this block is the initial object.

d.  Select BLOCK (Figure A.5) since this is the
object type desired.

e.  Next type in tutorblk_1 at the prompt, followed
by a <CR>. This is now the name of a new, yet to be defined,
block, as indicated by the successful creation statement on
the screen.

Figure A.2  Executive menu (screen image: Genesil Version v7.0 start-up banner with copyright and restricted rights notices; options CONTINUE, EXIT_GENESIL, CANCEL, SELECT_OBJECT, DEFINITION, PACKAGE_EDIT)

Figure A.3  SELECT_OBJECT menu (screen image: message reporting successful creation of tutorblk_1; options BACK, UP, ACCOUNT, ATTACH, RENAME, CANCEL, DOWN, PATH, DETACH, DISPLAY, SIDEWAYS)

Figure A.4  ATTACH option menu (screen image: options CANCEL, EXISTING, NEW)

Figure A.5  ATTACH New Object Type menu (screen image: options CANCEL, BLOCK, GENERAL_MODULE, CHIP, CHIP_SET, PARALLEL_DP, RANDOM_LOGIC)

f.  Select BACK, which returns the user to the main
Executive menu.

g.  Now select SELECT_OBJECT, and on the next screen
select DOWN. The next screen should show a list of sub-objects
on the right side of the screen (Figure A.6). Select
tutorblk_1 and it will now be attached to the tree.

h.  Now go BACK to the initial Executive menu (Figure A.2).

2.  The  block  now  needs  to  be  defined: 

a.  Select DEFINITION (Figure A.2). The next screen
is the initial Definition menu (Figure A.7), as denoted by the
upper right hand corner of the screen. The upper left hand
corner indicates the object types and pathnames.

b.  Select HEADER (Figure A.7). The next screen
(Figure A.8) is the Header form. Select RANDOM_LOGIC under
Function type and CONFIRM it, then select VTC_CP10B under Fab
line.

c.  Next ACCEPT_FORM (Figure A.8), which will return
the screen to the Definition menu. Now select SPECIFICATION,
which moves the screen to the RANDOM LOGIC Functional
Specification form (Figure A.9).

d.  Select NEW (Figure A.9), and a random logic
library pops up on the right side of the screen. Select ADDER
and DFF from the logic library.
e.  Now select BACK and the screen should be back in
the RANDOM LOGIC Functional Specification form (Figure A.10)
with the ADDER and DFF included.

f.  Select EDIT, which is adjacent to ADDERO (Figure
A.10), and the next screen will be a specification form for the
adder (Figure A.11). Details of all random logic
specification forms are found in the Genesil Silicon Compiler
Library Volume I, Blocks.

g.  Fill in the adder specification form as shown in
Figure A.12. Select EXPAND for a line-by-line entry form if
desired and select COMPRESS to return.

h.  Select NEXT (Figure A.12), which pulls up a
specification menu for the DFF. Fill it out as shown in
Figure A.13.

i.  Now select BACK to return to the RANDOM LOGIC
Functional Specification form (Figure A.10).

j.  Select SIGNALS (Figure A.10) and the screen shows
a signal list of the block. Make the signals correspond to
Figure A.14 by selecting I, O, and L next to the signal names.
This cleans up the circuit, because the system normally assumes
the user desires Both.

k.  Now select BACK to return to Figure A.10.

l.  If desired, VIEW may now be selected for a block
diagram, with signals, for inspection. Use BACK to return to
the specification form.
m.  Now select ACCEPT_FORM, and after the system
writes the text file and validates the form, the screen should
be back to the Executive menu (Figure A.15).

3.  The  block  will  now  be  compiled: 

a.  From  Figure  A. 15  select  COMPILE.  The  next  screen 
gives  the  user  various  compile  options  including  simulation, 
timing  analysis,  and  layout.  Since  all  will  be  used  later, 
select  BUILD_ALL  (Figure  A. 16). 

b.  At completion of the compile, the screen will again
return to the Executive menu (Figure A.17).

4.  The  block  will  now  be  simulated.  Detailed  simulation 
information  can  be  found  in  the  Simulation  Users  Guide: 

a.  Select SIMULATION (Figure A.17).

b.  Select GFL, then SIMULATE on the Simulation
Environment form (Figure A.18). Notice the system is now in
the Functional Simulator (upper right hand corner of the screen).

c.  Select BIND (Figure A.19) to input signal values.

d.  Select MULTIPLE_SIGS (Figure A.20) since there
are several signals to input.

e.  Type in a[0] etc., and the value (0 or 1) as
prompted. Use the values shown in Figure A.20 as initial
examples.

f.  When  all  desired  signal  values  have  been  entered, 
select  BACK  (Figure  A. 20). 

g.  Now  select  CYCLE  and  2  (Figure  A. 21)  to  cycle  the 
system  clocks  twice. 
h.  Type in pi and depress RETURN (Figure A.21).

i.  Now the screen should look like Figure A.22,
which shows the block simulated for proper logic. Other values
may be inserted by repeating the previous steps beginning with c.

j.  Now select BACK, then EXIT_SIM, with a CONFIRM to
return to the Executive menu (Figure A.17).
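
When checking the simulator output for proper logic, it helps to have the expected register contents worked out in advance. The following Python sketch is a simple reference model of the block, not Genesil code. It assumes that the carry out of the adder is stored in bit 4 of the 5 bit D F/F and the sum in bits 3 through 0; the function name tutorblk_1_cycle is hypothetical.

```python
def tutorblk_1_cycle(a: int, b: int, cin: int) -> int:
    """Expected value held by the 5 bit D F/F after the inputs are clocked in.

    Assumption: bit 4 holds the adder carry out and bits 3..0 hold the sum.
    """
    if not (0 <= a <= 15 and 0 <= b <= 15 and cin in (0, 1)):
        raise ValueError("a and b must be 4 bit values; cin must be 0 or 1")
    total = a + b + cin            # ranges from 0 to 31, so it fits in 5 bits
    sum_bits = total & 0xF         # lower four bits: adder sum
    carry_out = (total >> 4) & 1   # fifth bit: adder carry out
    return (carry_out << 4) | sum_bits


if __name__ == "__main__":
    # Print the expected register contents as a 5 bit binary string.
    print(format(tutorblk_1_cycle(9, 8, 1), "05b"))
```

Values bound with BIND and MULTIPLE_SIGS can be run through this model first, and the result compared bit by bit with the simulator display.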

5.  Timing  analysis  will  now  be  performed  on  the  block. 
The  Timing  Analysis  Users  Guide  contains  detailed  information 
concerning  timing  data  and  commands: 

a.  Select  TIMING  (Figure  A. 17). 

b.  The system should be in the Timing Analyser
function as shown in Figure A.23.

c.  Select CLOCKS, and all object clock information
is displayed as shown in Figure A.24.

d.  Select BACK (Figure A.24).

e.  Select PATH_DELAY (Figure A.23). The screen now
shows a list of all user generated nodes or signals (Figure
A.25). When the user selects source and destination signals from the
list, the system calculates logic propagation delays between
the selected nodes (Figure A.26). This is where the detailed
schematic may be useful.

f.  Select  BACK  (Figure  A. 26)  to  get  to  Figure  A. 23, 
then  BACK  to  exit  timing,  followed  by  CONFIRM. 
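
The idea behind a path delay query can be illustrated with a small graph search. The Python sketch below is not the Genesil Timing Analyser; it assumes that the reported figure is the worst-case (longest) delay between the two selected nodes, and every node name and delay value in EDGES is a made-up placeholder.

```python
from functools import lru_cache

# Hypothetical stage delays in nanoseconds between named nodes.
# Real values come from the timing analysis of the compiled block.
EDGES = {
    "a[0]": {"AO[0]": 4.0, "AO_COUT": 9.0},
    "AO[0]": {"DFF1[0]": 1.5},
    "AO_COUT": {"DFF1[4]": 1.5},
}


def path_delay(source, dest, edges=EDGES):
    """Longest delay from source to dest in an acyclic graph, or None if no path exists."""

    @lru_cache(maxsize=None)
    def longest(node):
        if node == dest:
            return 0.0
        best = None
        for nxt, delay in edges.get(node, {}).items():
            tail = longest(nxt)
            if tail is not None and (best is None or delay + tail > best):
                best = delay + tail
        return best

    return longest(source)


if __name__ == "__main__":
    print(path_delay("a[0]", "DFF1[4]"))
```

A detailed schematic serves the same purpose as the EDGES table here: it shows which source and destination pairs are actually connected, so that only meaningful pairs are selected.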
6.  Now to exit Genesil:

a.  Select EXIT_GENESIL (Figure A.15).

b.  Select CONFIRM.

c.  Select the appropriate log command. The block is
stored in the user's account regardless of which log command
is selected. It is best not to save the log, in the interest
of memory, unless a future printout is desired.

C.  TUTORBLK_2 BLOCK

This section is a user exercise to build a block named
tutorblk_2 by following the steps illustrated in section B.
This block is necessary for the completion of the chip in
section E.

The block will contain six random logic objects,
consisting of 5 inverters (i0-i4) and a 5 bit D F/F (DFF5).
The pre-planned schematic of the block, including all
necessary signal names, is shown in Figure A.27.

Figure A.27  Tutorblk_2


To  ensure  that  the  block  functions  properly  and  will 
connect  properly  to  the  chip,  make  the  specification  menus 
match  Figure  A. 28  through  A. 32.  Then  compile,  simulate,  and 
perform  timing  analysis  as  in  section  B. 
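
As with section B, a small reference model gives expected values to compare against the simulator. The Python sketch below assumes that the five inverters feed the 5 bit D F/F, so that the register holds the bitwise complement of the input; the actual connection order is given by Figure A.27, and the function name tutorblk_2_cycle is hypothetical.

```python
def tutorblk_2_cycle(data: int) -> int:
    """Expected 5 bit register value, assuming the inverters drive the D F/F inputs."""
    if not 0 <= data <= 0b11111:
        raise ValueError("data must be a 5 bit value")
    return ~data & 0b11111  # invert every bit and keep only the lower five


if __name__ == "__main__":
    # Print the expected register contents as a 5 bit binary string.
    print(format(tutorblk_2_cycle(0b10010), "05b"))
```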
D.  MULT_MOD MODULE

This section illustrates the design of a 4 bit multiplier
module. The detailed schematic is shown in Figure A.33.

Figure A.33  Mult_Mod Module

Library_mult  is  a  Genesil  system  library  parallel  multiplier 
block.  The  external  adder  is  necessary  to  complete  the 
multiplication.  Detailed  information  concerning  the 
multiplier  block  is  contained  in  the  Genesil  Silicon  Compiler 
Library  Volume  I,  Blocks. 

1.  The user should proceed as in the previous sections
up through ATTACH NEW (Figure A.5):

a.  Now select GENERAL_MODULE (Figure A.5).

b.  Go BACK, name the module Mult_Mod by typing it
in next to the prompt, and attach it to the tree as in the
previous sections.

c.  Select DEFINITION from the Executive menu.

d.  Select HEADER (Figure A.34), then VTC_CP10B for
Fab line, and ACCEPT_FORM.

e.  Now select SPECIFICATION to pull up the Module
specification menu. No objects are on the menu yet.

f.  Select ATTACH_NEW, then BLOCK. The user
will now be prompted for a name. Name the first random logic
object adder_3bit.

g.  Repeat for dff_8bit.

h.  Now select ATTACH_NEW, then select BLOCK, and
name it library_mult when prompted. When complete, the screen
should look like Figure A.35.

i.  Each sub-object (adder_3bit, dff_8bit, and
library_mult) must now be defined.

j.  Starting with adder_3bit, select DEFINE (Figure
A.35). Then select HEADER and specify RANDOM_LOGIC for object
type on the Header form. Fab line can be specified, but will
be automatically taken care of at the module level. Select
ACCEPT_FORM.

k.  Next select SPECIFICATION, which pulls up a RANDOM
LOGIC Functional Specification Menu. Select NEW, then ADDER
from the menu provided.

Figure A.34

(Screen image: Module specification form listing the sub-objects adder_3bit (Random_Logic), dff_8bit (Random_Logic), and library_mult (MULTIPLIER), each with DEFINE and DETACH options, together with ATTACH_NEW and ATTACH_EXISTING)



l.  Go BACK to the Specification menu and it should
look like Figure A.36.

m.  Now select EDIT, and fill out the Random Logic
Block Specification menu as shown in Figure A.37. Select
EXPAND and the menu should look like Figure A.38.

n.  Starting at step j, follow the same procedures
with dff_8bit. The RANDOM LOGIC Functional Specification form
(Figure A.39) and Random Logic Block Specification form
(Figure A.40) should be filled out as shown.

o.  Proceed the same way for library_mult, but
remember to select MULTIPLIER on the Header form.

p.  Fill out the MULTIPLIER SPECIFICATION menu as
shown in Figure A.41.

q.  Select ACCEPT_FORM (Figure A.41), then BACK to
the Module Specification Form (Figure A.35).

2.  The  module  must  now  be  netlisted: 

a.  Select OBJECT_NETLIST (Figure A.35).

b.  Type in adder_3bit (Figure A.42) next to Object
Name. Another way to do this is to mouse Object Name and depress
RETURN; a list of sub-objects is then pulled up on the right
side of the screen for selection.

c.  Proceed through the Net Name list, and select E
(for external) next to Attributes for all signals. This is
necessary to make the module function correctly!

d.  Select Object Name, and type in dff_8bit (Figure
A.42). Do the same as in c above.

Figure A.36

Figure A. 31

e.  Select Object Name, and type in library_mult
(Figure A.44). Do the same as in c above.

f.  Now select SPECIFICATION (Figure A.44) to check
the netlist. The system will state whether the netlist is valid or advise of
errors, which are listed by using VIEW_DRC_NETLIST.

3.  The module must now be floorplanned. Simulation and
timing analysis cannot be performed on a module prior to
netlisting and floorplanning.

a.  After netlist validation, the screen should show
a module Definition menu. If not, go BACK or ACCEPT_FORM to
return to the Definition menu. Select FLOOR_PLAN.

b.  Figure A.45 should now be on the screen. Select
PLACEMENT (Figure A.45).

c.  Next select each unplaced block (Figure A.46)
until all are placed. Blocks may be moved by hooking the
object with the MOUSE. The rest of the commands are explained
in detail in Chapter 6 of the System Description Applications
Commands manual.

d.  Next  go  BACK  (Figure  A. 46)  to  Figure  A. 45. 

e.  Select  PINOUT  (Figure  A. 45). 

f.  Now  select  AUTO_PINOUT  (Figure  A. 47). 

g.  Go  BACK  (Figure  A. 47)  to  Figure  A. 45. 

h.  Select  FUSION  (Figure  A. 45). 

i.  Select  AUTO_FUSE  (Figure  A. 48). 

j.  Now  go  BACK  (Figure  A. 48)  to  A. 45. 

k.  Select DONE (Figure A.45), followed by CONFIRM.
If all goes well, the floorplan will be complete.

4.  Now  simulation  and  timing  analysis  can  be  performed 
as  described  in  the  previous  sections. 
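
The ordering constraint in this section (definition, then netlist, then floorplan, and only then simulation or timing analysis) can be written down as a small prerequisite table. The Python sketch below is only a planning aid; the step names mirror the menu commands in the text, but PREREQUISITES and missing_steps are hypothetical names and not part of Genesil.

```python
# Hypothetical checklist for a module or chip; each step lists what must precede it.
PREREQUISITES = {
    "definition": [],
    "netlist": ["definition"],
    "floorplan": ["netlist"],
    "simulation": ["floorplan"],
    "timing": ["floorplan"],
}


def missing_steps(target, done):
    """List, in order, the steps still required before `target` can be run."""
    missing = []

    def visit(step):
        for dep in PREREQUISITES[step]:
            visit(dep)  # resolve earlier prerequisites first
            if dep not in done and dep not in missing:
                missing.append(dep)

    visit(target)
    return missing


if __name__ == "__main__":
    print(missing_steps("simulation", {"definition"}))
```

For a module that has only been defined, the sketch reports that netlisting and floorplanning are still outstanding, which matches the restriction stated in step 3.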

E.  TUTOR_CHIP  CHIP 

This section is a tutorial for a top-down chip design of
a chip named tutor_chip. The chip consists of tutorblk_1 and
tutorblk_2, a clock, input/output pads, a VSS pad, and a VDD pad.

A  schematic  with  signal  names  is  shown  in  Figure  A. 49. 


Figure  A. 49  Tutor_Chip  Chip 

1.  Definition and Specification:

a.  Proceed as in the previous sections down through
ATTACH NEW. Now select CHIP from the Executive menu (Figure
A.50).

b.  Name  the  chip  tutor_chip  when  prompted,  and 
attach  it  to  the  tree. 

c.  Select  DEFINITION  and  HEADER.  On  the  Chip  Header 
form  (Figure  A. 51)  select  Fab  line  VTC_CP10A  and  leave  the 
Package  Type  as  NO_PACKAGE. 

d.  Select  ACCEPT_FORM  (Figure  A. 51). 

e.  Now  select  SPECIFICATION  from  the  Definition  menu 
and  the  screen  should  show  a  blank  Chip  Specification  form 
(Figure  A. 52) . 

f.  Next  select  ATTACH_NEW  (Figure  A. 52)  and  BLOCK 
(Figure  A. 53).  Name  the  object  data_in  when  prompted. 

g.  Just like the module, each sub-object on the chip
must be defined by using the appropriate Header and
Specification forms.

h.  Select  DEFINE  next  to  data_in  on  the  Chip 
Specification  menu. 

i.  Select HEADER from the next screen. Select PAD
from the Header form for Function type (Figure A.54) and
CONFIRM. Select ACCEPT_FORM.

j.  Next select SPECIFICATION and the screen should
show a PAD Functional Specification menu (Figure A.55).
Select IN for Data Flow. All other specifications are
defaults except Width, Phase B, and Data In. Fill these in to
match Figure A.55.

k.  Now select ACCEPT_FORM (Figure A.55). Go BACK to
the Chip Specification form (Figure A.56). Figure A.56 is a
complete Chip Specification form for tutor_chip. Your Chip
Specification form should have only the data_in PAD on it at
this time.

l.  For each of the remaining PADS (clock, vss, vdd,
and data_out), repeat steps f through j, except for the
following PAD Functional Specification form changes:

(1)  Clock PAD: Select CLOCK for Pad type. All
other specifications are defaults except PHASE_A and PHASE_B.
Type in phase_a and phase_b in accordance with Figure A.57.
ACCEPT_FORM (Figure A.57) and continue to the next PAD.

(2)  VSS PAD: Select VSS for Pad type. All
other inputs are left to defaults. ACCEPT_FORM (Figure A.55)
and continue to the next PAD. The Chip Specification form
should now look like Figure A.59. The order is not important.

(3)  VDD PAD: Select VDD for Pad type. All
other inputs are left to defaults. ACCEPT_FORM (Figure A.60)
and continue to the next PAD.

(4)  data_out PAD: Select OUT for Data Flow.
All other specifications are defaults except Width, Phase A,
Phase B, and Data Out. Fill these in to match Figure A.61.
ACCEPT_FORM and go BACK to the Chip Specification form.

m.  From the Chip Specification form, select
ATTACH_EXISTING. The user will now be prompted for the path
to the existing object. Type in ~gensettle/settle/tutorblk_1
and depress RETURN (use your own name and path unless you are
in Settle's account). Now select DEFAULT_TO_CURRENT_NAME.
The Chip Specification form should now include tutorblk_1.

n.  Attach tutorblk_2 in the same manner. The Chip
Specification form should now contain the same objects as
Figure A.56.

2.  The  Chip  must  now  be  netlisted: 

a.  Select OBJECT_NETLIST (Figure A.56) and complete
each object's netlist as described in the module section.
Ensure all object netlists match Figures A.62 through A.67. Type in
the required signal names where applicable, and depress RETURN
at the end of each line.

b.  Select SPECIFICATION (Figure A.67), and if there
are no netlist errors, the Chip Specification form should be
on the screen (Figure A.56). Now select ACCEPT_FORM (Figure
A.56), and the screen will now show the Definition menu as
illustrated in Figure A.68.

3.  The Chip must now be floorplanned:

a.  Select  FLOORPLAN  (Figure  A. 68). 

b.  Use  the  module  placement  procedures  to  place  the 
two  unplaced  blocks  (Figure  A. 69). 

c.  Go  BACK  to  the  initial  floorplan  menu  (Figure 
A. 70).  Select  PINOUT.  All  unplaced  PADS  are  now  listed. 

Figure A.66  Chip object netlist (screen image: objects data_in, clock, vss, tutorblk_1, tutorblk_2, vdd, and data_out; net names include DATAIN_PADS[7:0], CLK_PAD, DATAOUT_PADS[4:0], and DATAOUT[4:0]; messages "No loop is detected", "Netlist is valid", and "Netlist is stored")

Figure A.69  Floorplan placement menu (screen image: unplaced blocks tutorblk_1 and tutorblk_2; options include BEST_PLACE, CENTER, PAN, CHECK_SPEC, SCALE, AUTO_PLACEMENT, ZOOM)

Figure A.70  Initial floorplan menu (screen image: options DONE, PLACEMENT, CANCEL, PINOUT, FUSION)

Select each PAD with the MOUSE. Place the cross hairs around
the chip in the desired PAD location. Click the right MOUSE
button. After placing a PAD, it may be moved by hooking it
with the MOUSE, and moving it to the desired new location.

d.  After  placing  all  PADS,  go  BACK  to  the  initial 
floorplan  menu  (Figure  A. 70)  and  select  FUSION. 

e.  Select  AUTO_FUSE.  Go  BACK  to  the  initial 
floorplan  menu  (Figure  A. 70) . 

f.  Select  DONE  (Figure  A. 70).  If  all  pads  are 
placed  correctly,  the  system  will  confirm  floorplan  complete. 

g.  If  there  are  PAD  placement  errors,  try  moving  the 
PADS  around  and  select  DONE  again. 

4.  After  the  chip  is  netlisted  and  floorplanned,  it  may 
be  simulated  and  timing  analysis  performed  in  accordance  with 
the  previous  block  and  module  instructions. 

5.  The  chip  may  be  plotted  using  the  following  commands: 

a.  Select PLOT (Figure A.71).

b.  Next select NEW_PLOT (Figure A.72).

c.  Select LAYOUT for a VLSI layout of the chip
(Figure A.73).

d.  Now select WORKSTATION (Figure A.74), and then GO
(Figure A.75).

e.  If all goes well, the screen should show a
layout similar to Figure A.76.


Figure A.71  Executive menu after chip compilation (screen image: compile log for chip tutor_chip reporting 250 transistors, dissipation of 17.4 milliwatts at 5 V and 10 MHz, fabline VTC_CP10B, technology CMOS-1, and chip size 2265 x 2221 um (89.2 x 87.4 mils); options include EXIT_GENESIL, SELECT_OBJECT, DEFINITION, COMPILE, TOOLING, PACKAGE_EDIT, SIMULATION, PLOT, TIMING, TRANSLATE)

Figure A.72  Plot menu (screen image: same compile log as Figure A.71; options include NEW_PLOT and PLOT_FILE)


Plot

Chip: ~gensettle/settle/tutor_chip

---------- Genesil Version v7.0 ----------

> Capacitance for 'phase_b' is 0.12 pf
> I: Peak AC current to VSS: 35763 uA
> Key Parameters: 250 transistors, Dissipation: 17.4 milliWatts@5V@10Mhz
> Key Parameters (set 124) Modified
> Done with command COMPILE:LOAD_MODEL - in Block: /ditt^out
> Times: real=40s, cpu=19s (u=12s, s=6.8s) (c=19s)
> Executing command COMPILE:LAYOUT - in Chip: /tutor_chip
>
> Fab line: VTC_CP10B Technology: CMOS-1
> Package not found
> rqfreq02-U-I Clock net phase_a given to have a maximum frequency of 10.00 Mhz
> rqfreq02-U-I Clock net phase_b given to have a maximum frequency of 10.00 Mhz
> ROUTER maximum ir-drop voltages (mv) found: VSS 24 VCC 35
> ROUTER maximum ir-drop voltages (mv) found: VSS 24 VCC 35
> Package not found
> Chip size in um: 2265 x 2221, in mils: 89.2 x 87.4
> Key Parameters (set 121) Modified
> Key Parameters (set 123) Modified
> Done with command COMPILE:LAYOUT - in Chip: /tutor_chip
> Times: real=186s, cpu=119s (u=97s, s=21s) (c=119s)

HP_PLOTTER
B_SIZE_PAPER
tutor
GO

> Plot STATISTICS: 3034 vectors, 12651 rectangles, 5872 line rectangles, 14324 clipped rectangles

PLOT_FILE
HP7580A
tutor_1

> Plot is queued

BACK
PLOT
NEW_PLOT

INSERT  MESSAGES  GRAPHICS  OVERLAY  RECORD  UTILITY

BACK  LAYOUT  ROUTE  FLOORPLAN  PAPER_DOLLS  BONDING_DIAGRAM  DISABLE_CURRENCY

;plot; new_plot;

Figure A.73
Plot

Chip: ~gensettle/settle/tutor_chip

---------- Genesil Version v7.0 ----------

> Times: real=40s, cpu=19s (u=12s, s=6.8s) (c=19s)
> Executing command COMPILE:LAYOUT - in Chip: /tutor_chip
>
> Fab line: VTC_CP10B Technology: CMOS-1
> Package not found
> rqfreq02-U-I Clock net phase_a given to have a maximum frequency of 10.00 Mhz
> rqfreq02-U-I Clock net phase_b given to have a maximum frequency of 10.00 Mhz
> ROUTER maximum ir-drop voltages (mv) found: VSS 24 VCC 35
> ROUTER maximum ir-drop voltages (mv) found: VSS 24 VCC 35
> Package not found
> Chip size in um: 2265 x 2221, in mils: 89.2 x 87.4
> Key Parameters (set 121) Modified
> Key Parameters (set 123) Modified
> Done with command COMPILE:LAYOUT - in Chip: /tutor_chip
> Times: real=186s, cpu=119s (u=97s, s=21s) (c=119s)

HP_PLOTTER
B_SIZE_PAPER
tutor
GO

> Plot STATISTICS: 3034 vectors, 12651 rectangles, 5872 line rectangles, 14324 clipped rectangles

PLOT_FILE
HP7580A
tutor_1

> Plot is queued

BACK
PLOT
LAYOUT

> Checking file currency
> Internal Object Hierarchy Initialized
> Completing Data Gathering Phase
> All files are up to date

BACK  WORKSTATION  HP_PLOTTER

;plot; new_plot; layout;


Figure A.74


Chip: ~gensettle/settle/tutor_chip

---------- Genesil Version v7.0 ----------

Scale Factor: 51.42    Rotate: OFF

Number of pages (x by y): 1 by 1

Object Size (mils): 89.20 x 87.44
Window Size (mils): 89.20 x 87.44

Object Limits (mils): (-44.60,-43.72) (44.60,43.72)
Window Limits (mils): (-44.60,-43.72) (44.60,43.72)

Device Co-ord System Size: 10000 x 5840
Viewport Co-ord System Size: 10000 x 5840

Device Co-ord System Limits: (0,0) (10000,5840)
Viewport Co-ord System Limits: (0,0) (10000,5840)

Plot

INSERT  MESSAGES  GRAPHICS  FORM  OVERLAY  RECORD  UTILITY

CANCEL  WINDOW  SPLIT  SELECT_LAYERS  RESET  VIEWPORT  OPTIMIZED  OFF  GO  SCALE  ROTATE

;plot; new_plot; layout; workstation;

Figure A.75
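
The layout log reports the chip size in both micrometres (2265 x 2221) and mils (89.2 x 87.4), and the plot form shows the object size as 89.20 x 87.44 mils. The short Python sketch below is not part of the Genesil system; it is only an illustrative check of that unit conversion, and the function and variable names are hypothetical.

```python
# Illustrative unit check only; not Genesil code.
# 1 mil = 0.001 inch = 25.4 micrometres.
UM_PER_MIL = 25.4


def um_to_mils(length_um: float) -> float:
    """Convert a length in micrometres to mils."""
    return length_um / UM_PER_MIL


# Chip dimensions in micrometres, as reported by COMPILE:LAYOUT.
chip_um = (2265, 2221)

# Round to one decimal place, the precision used in the layout log.
chip_mils = tuple(round(um_to_mils(d), 1) for d in chip_um)
print(chip_mils)  # (89.2, 87.4)
```

To one decimal place the converted values agree with the mils figures printed in the log.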